NSF PAR Search | NSF Public Access Repository

BELIEF in dependence: Leveraging atomic linearity in data bits for rethinking generalized linear models

https://doi.org/10.1214/25-AOS2493

Brown, Benjamin; Zhang, Kai; Meng, Xiao-Li (June 2025, The Annals of Statistics)

Free, publicly-accessible full text available June 1, 2026
Six Maxims of Statistical Acumen for Astronomical Data Analysis

Tak, Hyungsuk; Chen, Yang; Kashyap, Vinay L; Mandel, Kaisey S; Meng, Xiao-Li; Siemiginowska, Aneta; van Dyk, David A (October 2024, The Astrophysical Journal Supplement Series)

The production of complex astronomical data is accelerating, especially with newer telescopes producing ever more large-scale surveys. The increased quantity, complexity, and variety of astronomical data demand a parallel increase in skill and sophistication in developing, deciding, and deploying statistical methods. Understanding limitations and appreciating nuances in statistical and machine learning methods and the reasoning behind them is essential for improving data-analytic proficiency and acumen. Aiming to facilitate such improvement in astronomy, we delineate cautionary tales in statistics via six maxims, with examples drawn from the astronomical literature. Inspired by the significant quality improvement in business and manufacturing processes by the routine adoption of Six Sigma, we hope the routine reflection on these Six Maxims will improve the quality of both data analysis and scientific findings in astronomy.
Full Text Available
Six Maxims of Statistical Acumen for Astronomical Data Analysis

https://doi.org/10.3847/1538-4365/ad8440

Tak, Hyungsuk; Chen, Yang; Kashyap, Vinay L; Mandel, Kaisey S; Meng, Xiao-Li; Siemiginowska, Aneta; van Dyk, David A (November 2024, The Astrophysical Journal Supplement Series)

The acquisition of complex astronomical data is accelerating, especially with newer telescopes producing ever more large-scale surveys. The increased quantity, complexity, and variety of astronomical data demand a parallel increase in skill and sophistication in developing, deciding, and deploying statistical methods. Understanding limitations and appreciating nuances in statistical and machine learning methods and the reasoning behind them is essential for improving data-analytic proficiency and acumen. Aiming to facilitate such improvement in astronomy, we delineate cautionary tales in statistics via six maxims, with examples drawn from the astronomical literature. Inspired by the significant quality improvement in business and manufacturing processes by the routine adoption of Six Sigma, we hope the routine reflection on these six maxims will improve the quality of both data analysis and scientific findings in astronomy.
Statistics and AI: A Fireside Conversation

https://doi.org/10.1162/99608f92.c066fe9c

Lin, Xihong; Cai, Tianxi; Donoho, David; Fu, Haoda; Ke, Tracy; Jin, Jiashun; Meng, Xiao-Li; Qu, Annie; Shi, Chengchun; Song, Peter; et al. (January 2025, Harvard Data Science Review)

Full Text Available
Six Statistical Senses

https://doi.org/10.1146/annurev-statistics-040220-015348

Craiu, Radu V.; Gong, Ruobin; Meng, Xiao-Li (March 2023, Annual Review of Statistics and Its Application)

This article proposes a set of categories, each one representing a particular distillation of important statistical ideas. Each category is labeled a “sense” because we think of these as essential in helping every statistical mind connect in constructive and insightful ways with statistical theory, methodologies, and computation, toward the ultimate goal of building statistical phronesis. The illustration of each sense with statistical principles and methods provides a sensical tour of the conceptual landscape of statistics, as a leading discipline in the data science ecosystem. Expected final online publication date for the Annual Review of Statistics and Its Application, Volume 10 is March 2023. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Full Text Available
Double Your Variance, Dirtify Your Bayes, Devour Your Pufferfish, and Draw your Kidstogram

https://doi.org/10.51387/22-NEJSDS6

Meng, Xiao-Li (October 2022, The New England Journal of Statistics in Data Science)

This article expands upon my presentation to the panel on “The Radical Prescription for Change” at the 2017 ASA (American Statistical Association) symposium on A World Beyond $p<0.05$. It emphasizes that, to greatly enhance the reliability of, and hence public trust in, statistical and data scientific findings, we need to take a holistic approach. We need to lead by example, incentivize study quality, and inoculate future generations with profound appreciations for the world of uncertainty and the uncertainty world. The four “radical” proposals in the title, with all their inherent defects and trade-offs, are designed to provoke reactions and actions. First, research methodologies are trustworthy only if they deliver what they promise, even if this means that they have to be overly protective, a necessary trade-off for practicing quality-guaranteed statistics. This guiding principle may compel us to double the variance in some situations, a strategy that also coincides with the call to raise the bar from $p<0.05$ to $p<0.005$ [3]. Second, teaching principled practicality or corner-cutting is a promising strategy to enhance the scientific community’s as well as the general public’s ability to spot, and hence to deter, flawed arguments or findings. A remarkable quick-and-dirty Bayes formula for rare events, which simply divides the prevalence by the sum of the prevalence and the false positive rate (or the total error rate), as featured by the popular radio show Car Talk, illustrates the effectiveness of this strategy. Third, it should be a routine mental exercise to put ourselves in the shoes of those who would be affected by our research finding, in order to combat the tendency of rushing to conclusions or overstating confidence in our findings. A pufferfish/selfish test can serve as an effective reminder, and can help to institute the mantra “Thou shalt not sell what thou refuseth to buy” as the most basic professional decency. Considering personal stakes in our statistical endeavors also points to the concept of behavioral statistics, in the spirit of behavioral economics. Fourth, the current mathematical education paradigm that puts “deterministic first, stochastic second” is likely responsible for the general difficulties with reasoning under uncertainty, a situation that can be improved by introducing the concept of the histogram, or rather kidstogram, as early as the concept of counting.
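The quick-and-dirty Bayes formula described in this abstract can be illustrated with a minimal sketch. It compares the shortcut (prevalence divided by the sum of prevalence and false positive rate) with the standard Bayes computation of the probability of having the condition given a positive test. The function names and the numerical inputs (a prevalence of 0.1%, a sensitivity of 99%, a false positive rate of 5%) are hypothetical values chosen for illustration and do not come from the article.

```python
def exact_posterior(prevalence, sensitivity, false_positive_rate):
    """Standard Bayes rule: P(condition | positive test)."""
    true_positive = sensitivity * prevalence
    false_positive = false_positive_rate * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


def quick_posterior(prevalence, false_positive_rate):
    """Shortcut from the abstract: prevalence / (prevalence + false positive rate)."""
    return prevalence / (prevalence + false_positive_rate)


# Hypothetical inputs for a rare event (assumed, not taken from the article).
prevalence = 0.001
sensitivity = 0.99
false_positive_rate = 0.05

exact = exact_posterior(prevalence, sensitivity, false_positive_rate)
quick = quick_posterior(prevalence, false_positive_rate)
print(f"exact: {exact:.4f}  quick: {quick:.4f}")
```

With these assumed inputs both numbers come out near 2%. The shortcut is close to the exact answer only when the event is rare and the test sensitivity is near 1; it should not be expected to hold for common events.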
Full Text Available
Scalable Spike-and-Slab

Biswas, Niloy; Mackey, Lester; Meng, Xiao-Li (July 2022, PMLR)
Chaudhuri, Kamalika and (Ed.)
Spike-and-slab priors are commonly used for Bayesian variable selection, due to their interpretability and favorable statistical properties. However, existing samplers for spike-and-slab posteriors incur prohibitive computational costs when the number of variables is large. In this article, we propose Scalable Spike-and-Slab (S^3), a scalable Gibbs sampling implementation for high-dimensional Bayesian regression with the continuous spike-and-slab prior of George & McCulloch (1993). For a dataset with n observations and p covariates, S^3 has order max{n^2 p_t, np} computational cost at iteration t where p_t never exceeds the number of covariates switching spike-and-slab states between iterations t and t-1 of the Markov chain. This improves upon the order n^2 p per-iteration cost of state-of-the-art implementations as, typically, p_t is substantially smaller than p. We apply S^3 on synthetic and real-world datasets, demonstrating orders of magnitude speed-ups over existing exact samplers and significant gains in inferential quality over approximate samplers with comparable cost.
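As a rough illustration of the per-iteration cost comparison stated in this abstract, the sketch below evaluates the two order-of-magnitude expressions, n^2 p for existing implementations and max{n^2 p_t, np} for S^3, at one set of problem sizes. The function names and the values of n, p, and p_t are hypothetical, constant factors are ignored, and nothing here reproduces the sampler itself.

```python
def baseline_cost(n, p):
    """Order n^2 * p per-iteration cost quoted for existing implementations."""
    return n**2 * p


def s3_cost(n, p, p_t):
    """Order max(n^2 * p_t, n * p) per-iteration cost quoted for S^3."""
    return max(n**2 * p_t, n * p)


# Hypothetical sizes (assumed): 100 observations, 10,000 covariates,
# and p_t = 20 at the current iteration.
n, p, p_t = 100, 10_000, 20

baseline = baseline_cost(n, p)
s3 = s3_cost(n, p, p_t)
print(f"baseline: {baseline:,}  S^3: {s3:,}  ratio: {baseline / s3:.0f}x")
```

Under these assumed sizes the n p term dominates the S^3 cost, so the saving comes from avoiding the n^2 p term; when p_t is larger relative to p/n, the n^2 p_t term takes over and the gap narrows.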
Warp Bridge Sampling: The Next Generation

https://doi.org/10.1080/01621459.2020.1825447

Wang, Lazhi; Jones, David E.; Meng, Xiao-Li (April 2022, Journal of the American Statistical Association)

Full Text Available
Double Happiness: Enhancing the Coupled Gains of L-lag Coupling via Control Variates

https://doi.org/10.5705/ss.202020.0461

Craiu, Radu V.; Meng, Xiao-Li (January 2022, Statistica Sinica)

Full Text Available
Multiple Improvements of Multiple Imputation Likelihood Ratio Tests

https://doi.org/10.5705/ss.202019.0314

Chan, Kin Wai; Meng, Xiao-Li (January 2022, Statistica Sinica)

Full Text Available
